Something is clearly wrong here. I am not sure whether the weights or the average are at fault; I suspect the latter, since we are estimating a single fixed mean in a heterogeneous population.
In the original publication, they provide a heatmap of the top genes, selected by their highest loadings on PC1–3. We should do the same with wPCA and see whether we get better results.
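As a rough illustration of how top genes could be picked by loading, here is a minimal NumPy sketch. It is not the method of the original publication or of any particular wPCA package: it assumes one weight per cell (the weighting used in this analysis may instead be per entry), and the function name `top_loading_genes` and its parameters are hypothetical.

```python
import numpy as np

def top_loading_genes(X, weights, n_components=3, n_top=2):
    """Sketch: weighted PCA on a cells x genes matrix, then rank genes by loading.

    Assumption: `weights` holds one non-negative weight per cell (row of X).
    Returns (loadings, top), where loadings is genes x n_components and
    top[:, k] lists the n_top gene indices with the largest |loading| on PC k+1.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Weighted mean of each gene, then centre the data on it.
    mu = (w[:, None] * X).sum(axis=0) / w.sum()
    Xc = X - mu

    # Weighted covariance matrix (genes x genes).
    cov = (Xc * w[:, None]).T @ Xc / w.sum()

    # eigh returns eigenvalues in ascending order, so re-sort descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]

    # For each component, gene indices sorted by absolute loading (largest first).
    top = np.argsort(-np.abs(loadings), axis=0)[:n_top]
    return loadings, top
```

The union of the indices in `top` could then be used to subset the expression matrix before drawing the heatmap. Note that this sketch still centres on a single weighted mean, so it would share any problem caused by a fixed mean in a heterogeneous population.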
Here, we will repeat the analysis separately for the high-coverage-only and the low-coverage-only data.